Yeah, so it's a great pleasure to be here. I don't consider myself really in the field of machine learning, but I have a lot of colleagues in the field, so once in a while I bump into them and have some conversations, which eventually led to some of the work here. There are two parts: the first is a result about graph neural networks, and the second part is a fast
algorithm that we constructed to compute a very particular metric, the Wasserstein-1 distance.
The first part is joint work with a group of colleagues at my institute, in particular Yu Guang Wang, and the second with a group at Tsinghua University, in particular Professor Wu Hao.
All right, so in many machine learning problems you run into data that has a certain graph structure, so there you are talking about a graph with nodes and edges. The standard data, as we call it, has a Euclidean structure, usually pixels on regular grids, but there are many applications where you have to deal with a non-Euclidean structure. Here we can see the so-called node classification problem for graph neural networks: we assume we are given a graph, and the network has to label the nodes using the edge information. There are a lot of applications where this kind of thing is important, for example in drug design and drug repurposing, knowledge graphs, recommendation systems in e-commerce, movie reviews, book reviews, and even public opinion analysis, and so on.
So here is the graph neural network. We have this graph, assume for now that it is an undirected graph, so you have edges and nodes, and you want a network which reflects this geometric structure of the graph. Here is the typical graph neural network layer: you start with the l-th layer and you get to the next layer, denoted by the superscript l+1. Here A is the adjacency matrix, which encodes the connectivity between the different nodes, that is, the edge information, and I is the identity matrix, so you work with A + I; the D matrix is the corresponding degree matrix used for normalization, and the W matrices are the layer-specific trainable weight matrices. H is the matrix of activations in each layer, and for the activation function you can use a standard one, for example ReLU.
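To fix notation, the layer update being described is roughly the standard graph convolution rule (a sketch of the form used in the literature; the exact normalization on the slide may differ):

\[
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} (A + I)\, \tilde{D}^{-1/2}\, H^{(l)} W^{(l)} \right), \qquad \tilde{D}_{ii} = \sum_j (A + I)_{ij},
\]

where H^{(l)} collects the node features at layer l, W^{(l)} is the trainable weight matrix of that layer, and \sigma is the activation function.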
One can write such a graph neural network as a so-called message-passing system: going from layer l to layer l+1, you pass a message from one layer to the next. So suppose you are at layer k-1 and you have the node data x_j; through some aggregation and activation you move to the next layer. Here you have an operator corresponding to a so-called node-permutation-invariant function, for example a sum, a mean, a maximum, and so on; then you have another activation function, which we assume is differentiable, and you pass to the next layer.
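As a rough illustration of this message-passing scheme (the function name, the sum aggregation, and the ReLU update are my own choices for the sketch, not taken from the talk):

```python
import numpy as np

def message_passing_layer(X, A, W, aggregate=np.add.reduce):
    """One generic message-passing step.

    X : (n, d) node features at layer k-1
    A : (n, n) adjacency matrix of an undirected graph
    W : (d, d2) weight matrix for this layer
    aggregate : permutation-invariant reduction over the neighbor
                messages (sum here; mean or max would also qualify)
    """
    n, d = X.shape
    X_next = np.zeros((n, W.shape[1]))
    for i in range(n):
        neighbors = np.nonzero(A[i])[0]            # nodes j connected to node i
        if len(neighbors) == 0:
            aggregated = np.zeros(d)               # isolated node: empty message
        else:
            aggregated = aggregate(X[neighbors])   # combine neighbor features
        X_next[i] = np.maximum(aggregated @ W, 0.0)  # linear map + ReLU update
    return X_next

# Example: one layer applied to a small random undirected graph
n, d = 5, 3
A = (np.random.rand(n, n) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                     # symmetric, no self-loops
X = np.random.randn(n, d)
X = message_passing_layer(X, A, np.random.randn(d, d))
```

Any other permutation-invariant aggregation (mean, max) could be passed in place of the sum without changing the structure of the step.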
There are some standard ways to build a graph neural network; here are two standard examples. The graph convolutional network is over here: j runs over all the nodes connected to node i, you sum over all these nodes, and the weight matrix W is what is trained. There is also the graph attention network, where you have a matrix of attention coefficients a_ij; this is somewhat similar to what we heard yesterday about transformers. Here a LeakyReLU is used: the ReLU is x when x is positive and zero when x is negative, and here, instead of zero, you have a non-zero linear function when x is negative. The reason to do this is to avoid vanishing gradients in gradient descent: if you have a zero derivative you get stuck there, so you try to avoid that; that's the only purpose of this.
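For comparison, the two node-wise update rules being contrasted look roughly as follows (standard forms from the GCN and GAT literature; the talk's notation may differ slightly):

\[
\text{GCN:}\quad h_i^{(l+1)} = \sigma\Big( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}}\, W^{(l)} h_j^{(l)} \Big), \qquad
\text{GAT:}\quad h_i^{(l+1)} = \sigma\Big( \sum_{j \in \mathcal{N}(i)} a_{ij}\, W^{(l)} h_j^{(l)} \Big),
\]

where c_{ij} is a fixed degree-based normalization, while the attention coefficients a_{ij} are learned, typically as a softmax over the neighbors j of \mathrm{LeakyReLU}(\mathbf{a}^\top [W h_i \,\|\, W h_j]), with

\[
\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0,\\ \alpha x, & x < 0, \end{cases} \qquad 0 < \alpha \ll 1.
\]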
Then one of the bottlenecks of graph neural networks is the over-smoothing issue, which you also heard about when we just talked about transformers. You have all this data, and eventually all of it forms some kind of consensus: if you consider the neurons as particles, all these particles go to the same point. For example, with face recognition (Shi Jin, Professor Wei, and so on), in the end you cannot distinguish the different pictures; that is what is called over-smoothing. In particular, this happens provided you use deep layers: with a very shallow network there is usually no problem, but with deep layers you get this over-smoothing issue in a typical graph neural network. How do you characterize this over-smoothing? A good way to characterize it is to use the Dirichlet energy. For this Dirichlet energy, essentially, you have all these points, which you can consider as particles, and you sum the distances between them, weighted by the matrix a_ij, which is assumed positive and suitably normalized. So the over-smoothing
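One common form of this Dirichlet energy (a sketch; the exact normalization used in the talk may differ) is

\[
E\big(H^{(l)}\big) = \frac{1}{2} \sum_{i,j} a_{ij} \left\| \frac{h_i^{(l)}}{\sqrt{1+d_i}} - \frac{h_j^{(l)}}{\sqrt{1+d_j}} \right\|^2,
\]

where h_i^{(l)} is the feature of node i at layer l, a_{ij} are the non-negative adjacency weights, and d_i is the degree of node i; in this language, over-smoothing corresponds to this energy decaying toward zero as the depth grows.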